Acoustic observation context modeling in segment based speech recognition

Authors

  • Máté Szarvas
  • Shoichi Matsunaga
Abstract

This paper describes a novel method that models the correlation between acoustic observations in contiguous speech segments. The basic idea behind the method is that acoustic observations are conditioned not only on the phonetic context but also on the preceding acoustic segment observation. The correlation between consecutive acoustic observations is modeled by polynomial mean trajectory segment models. This method is an extension of conventional segment modeling approaches in that it not only describes the correlation of acoustic observations inside segments but also between contiguous segments. It is also a generalization of phonetic context (e.g., triphone) modeling approaches because it can model acoustic context and phonetic context at the same time. In a speaker-independent phoneme classification test, using the proposed method resulted in a 7-9% reduction in error rate as compared to the traditional triphone segmental model system and a 31% reduction as compared to a similar triphone HMM (hidden Markov model) system.
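
The abstract does not give the model's equations, so the following is only a rough illustration of what a polynomial mean trajectory is, not the authors' method. It assumes a single feature dimension, frame times normalized to [0, 1] within the segment, and an ordinary least-squares fit; the function names (`fit_mean_trajectory`, `trajectory`, `solve_linear`) are hypothetical. It does not attempt the conditioning on the preceding segment, which the abstract does not specify.

```python
from typing import List


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mean_trajectory(frames: List[float], order: int) -> List[float]:
    """Least-squares polynomial coefficients b_0..b_order for one segment.

    Assumption: the segment has at least order + 1 frames, otherwise the
    normal equations are singular.
    """
    n = len(frames)
    # Normalized time: first frame at 0, last frame at 1.
    times = [i / (n - 1) if n > 1 else 0.0 for i in range(n)]
    rows = [[t ** k for k in range(order + 1)] for t in times]
    size = order + 1
    # Normal equations (A^T A) b = A^T y.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(size)]
           for i in range(size)]
    aty = [sum(r[i] * y for r, y in zip(rows, frames)) for i in range(size)]
    return solve_linear(ata, aty)


def trajectory(coeffs: List[float], t: float) -> float:
    """Evaluate the fitted mean trajectory at normalized time t."""
    return sum(c * t ** k for k, c in enumerate(coeffs))


if __name__ == "__main__":
    # Made-up feature values for one segment (illustration only).
    segment = [0.9, 1.6, 1.8, 1.9, 2.1]
    coeffs = fit_mean_trajectory(segment, order=2)
    print([round(trajectory(coeffs, i / 4), 3) for i in range(5)])
```

In a segment model of this kind, the fitted curve would typically serve as the time-varying mean of a Gaussian over the segment's frames, in place of the piecewise-constant means of an HMM.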

Similar resources

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

Acoustic Modeling Improvements in a Segment-based Speech Recognizer

In this paper we report on some recent improvements on the acoustic modeling in a segment-based speech recognition system. Context-dependent segment models and improved pronunciation modeling are shown to reduce word error rates in a telephone-based, conversational system by over 18%, while the technique of Gaussian selection reduces overall computation by more than a factor of two.

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

Unifying HMM and phone-pair segment models

It is well known that HMM is ineffective in modeling the dynamics of speech due to the piecewise stationary and the independent observation assumptions. In this paper, we propose an analytically tractable framework in which the two modeling techniques are combined to reach a jointly optimal decision in both training and recognition. The combination is achieved by coupling the hidden processes f...

Improved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition

Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...

Publication date: 1998